31 research outputs found

    Approximating Approximate Pattern Matching

    Given a text T of length n and a pattern P of length m, the approximate pattern matching problem asks for computation of a particular distance function between P and every m-substring of T. We consider a (1 +/- epsilon) multiplicative approximation variant of this problem, for l_p distance function. In this paper, we describe two (1+epsilon)-approximate algorithms with a runtime of O~(n/epsilon) for all (constant) non-negative values of p. For constant p >= 1 we show a deterministic (1+epsilon)-approximation algorithm. Previously, such run time was known only for the case of l_1 distance, by Gawrychowski and Uznanski [ICALP 2018] and only with a randomized algorithm. For constant 0 <= p <= 1 we show a randomized algorithm for the l_p, thereby providing a smooth tradeoff between algorithms of Kopelowitz and Porat [FOCS 2015, SOSA 2018] for Hamming distance (case of p=0) and of Gawrychowski and Uznanski for l_1 distance
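
To make the problem statement above concrete, here is a minimal sketch of the naive exact baseline: it computes the l_p distance between P and every m-substring of T in O(nm) time. It is not the (1+epsilon)-approximate algorithm from the paper. The function name `lp_distances` is hypothetical, and the convention used for p > 0, (sum |a-b|^p)^(1/p), with p = 0 meaning the Hamming distance, is an assumption about the intended definition.

```python
def lp_distances(text, pattern, p):
    """Naive O(n*m) baseline: l_p distance between pattern and each m-substring of text.

    Assumed convention: p == 0 is the Hamming distance; for p > 0 the distance
    is (sum |a - b|^p)^(1/p). Inputs are sequences of numbers.
    """
    n, m = len(text), len(pattern)
    distances = []
    for i in range(n - m + 1):
        window = text[i:i + m]
        if p == 0:
            # Hamming distance: count mismatching positions.
            d = sum(1 for a, b in zip(window, pattern) if a != b)
        else:
            d = sum(abs(a - b) ** p for a, b in zip(window, pattern)) ** (1.0 / p)
        distances.append(d)
    return distances
```

For example, `lp_distances([1, 2, 3, 4], [2, 3], 1)` compares the pattern against the three windows `[1, 2]`, `[2, 3]` and `[3, 4]`.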

    Brief Announcement: Hamming Distance Completeness and Sparse Matrix Multiplication

    We show that a broad class of (+, diamond) vector products (for binary integer functions diamond) are equivalent under one-to-polylog reductions to the computation of the Hamming distance. Examples include: the dominance product, the threshold product and l_{2p+1} distances for constant p. Our results imply equivalence (up to poly log n factors) between complexity of computation of All Pairs: Hamming Distances, l_{2p+1} Distances, Dominance Products and Threshold Products. As a consequence, Yuster's (SODA'09) algorithm improves not only Matousek's (IPL'91), but also the results of Indyk, Lewenstein, Lipsky and Porat (ICALP'04) and Min, Kao and Zhu (COCOON'09). Furthermore, our reductions apply to the pattern matching setting, showing equivalence (up to poly log n factors) between pattern matching under Hamming Distance, l_{2p+1} Distance, Dominance Product and Threshold Product, with current best upper bounds due to results of Abrahamson (SICOMP'87), Amir and Farach (Ann. Math. Artif. Intell.'91), Atallah and Duket (IPL'11), Clifford, Clifford and Iliopoulos (CPM'05) and Amir, Lipsky, Porat and Umanski (CPM'05). The resulting algorithms for l_{2p+1} Pattern Matching and All Pairs l_{2p+1}, for 2p+1 = 3,5,7,... are new. Additionally, we show that the complexity of AllPairsHammingDistances (and thus of other aforementioned AllPairs- problems) is within poly log n from the time it takes to multiply matrices n x (n * d) and (n * d) x n, each with (n * d) non-zero entries. This means that the current upper bounds by Yuster (SODA'09) cannot be improved without improving the sparse matrix multiplication algorithm by Yuster and Zwick (ACM TALG'05) and vice versa

    Brief Announcement: Energy Constrained Depth First Search

    Depth first search is a natural algorithmic technique for constructing a closed route that visits all vertices of a graph. The length of such route equals, in an edge-weighted tree, twice the total weight of all edges of the tree and this is asymptotically optimal over all exploration strategies. This paper considers a variant of such search strategies where the length of each route is bounded by a positive integer B (e.g. due to limited energy resources of the searcher). The objective is to cover all the edges of a tree T using the minimum number of routes, each starting and ending at the root and each being of length at most B. To this end, we analyze the following natural greedy tree traversal process that is based on decomposing a depth first search traversal into a sequence of limited length routes. Given any arbitrary depth first search traversal R of the tree T, we cover R with routes R_1,...,R_l, each of length at most B such that: R_i starts at the root, reaches directly the farthest point of R visited by R_{i-1}, then R_i continues along the path R as far as possible, and finally R_i returns to the root. We call the above algorithm piecemeal-DFS and we prove that it achieves the asymptotically minimal number of routes l, regardless of the choice of R. Our analysis also shows that the total length of the traversal (and thus the traversal time) of piecemeal-DFS is asymptotically minimum over all energy-constrained exploration strategies. The fact that R can be chosen arbitrarily means that the exploration strategy can be constructed in an online fashion when the input tree T is not known in advance. Each route R_i can be constructed without any knowledge of the yet unvisited part of T. Surprisingly, our results show that depth first search is efficient for energy constrained exploration of trees, even though it is known that the same does not hold for energy constrained exploration of arbitrary graphs
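
The greedy process described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact procedure: edges have unit weight, routes may only stop at vertices, and the tree is given as a hypothetical `children` dictionary. The names `euler_tour` and `piecemeal_dfs` are illustrative.

```python
def euler_tour(children, root=0):
    """Return (tour, depth): the vertices of a closed DFS walk R and their depths."""
    tour, depth = [], []

    def visit(v, d):
        tour.append(v)
        depth.append(d)
        for c in children.get(v, []):
            visit(c, d + 1)
            # Step back to v after finishing the subtree of c.
            tour.append(v)
            depth.append(d)

    visit(root, 0)
    return tour, depth


def piecemeal_dfs(children, budget, root=0):
    """Cover the DFS walk R with routes of length at most `budget`.

    Each route walks from the root straight to the farthest point of R reached
    so far, continues along R as far as the budget allows, and returns to the
    root. Only the newly covered segment of R is recorded for each route.
    """
    tour, depth = euler_tour(children, root)
    routes, start, last = [], 0, len(tour) - 1
    while start < last:
        end = start
        # Cost of stopping at position end+1: reach start, walk along R, return.
        while end < last and depth[start] + (end + 1 - start) + depth[end + 1] <= budget:
            end += 1
        if end == start:
            raise ValueError("budget too small to make progress")
        routes.append(tour[start:end + 1])
        start = end
    return routes
```

With unit weights the cost of extending a route by one step never decreases, so stopping at the first step that exceeds the budget is safe. A star with three leaves and `budget=4`, for instance, needs two routes, since its closed DFS walk has length 6.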

    Improved Analysis of Deterministic Load-Balancing Schemes

    We consider the problem of deterministic load balancing of tokens in the discrete model. A set of $n$ processors is connected into a $d$-regular undirected network. In every time step, each processor exchanges some of its tokens with each of its neighbors in the network. The goal is to minimize the discrepancy between the number of tokens on the most-loaded and the least-loaded processor as quickly as possible. Rabani et al. (1998) present a general technique for the analysis of a wide class of discrete load balancing algorithms. Their approach is to characterize the deviation between the actual loads of a discrete balancing algorithm with the distribution generated by a related Markov chain. The Markov chain can also be regarded as the underlying model of a continuous diffusion algorithm. Rabani et al. showed that after time $T = O(\log (Kn)/\mu)$, any algorithm of their class achieves a discrepancy of $O(d\log n/\mu)$, where $\mu$ is the spectral gap of the transition matrix of the graph, and $K$ is the initial load discrepancy in the system. In this work we identify some natural additional conditions on deterministic balancing algorithms, resulting in a class of algorithms reaching a smaller discrepancy. This class contains well-known algorithms, e.g., the Rotor-Router. Specifically, we introduce the notion of cumulatively fair load-balancing algorithms where in any interval of consecutive time steps, the total number of tokens sent out over an edge by a node is the same (up to constants) for all adjacent edges. We prove that algorithms which are cumulatively fair and where every node retains a sufficient part of its load in each step, achieve a discrepancy of $O(\min\{d\sqrt{\log n/\mu},d\sqrt{n}\})$ in time $O(T)$. We also show that in general neither of these assumptions may be omitted without increasing discrepancy.
We then show by a combinatorial potential reduction argument that any cumulatively fair scheme satisfying some additional assumptions achieves a discrepancy of $O(d)$ almost as quickly as the continuous diffusion process. This positive result applies to some of the simplest and most natural discrete load balancing schemes.

    A Framework for Searching in Graphs in the Presence of Errors

    We consider a problem of searching for an unknown target vertex t in a (possibly edge-weighted) graph. Each vertex-query points to a vertex v and the response either admits that v is the target or provides any neighbor s of v that lies on a shortest path from v to t. This model has been introduced for trees by Onak and Parys [FOCS 2006] and for general graphs by Emamjomeh-Zadeh et al. [STOC 2016]. In the latter, the authors provide algorithms for the error-less case and for the independent noise model (where each query independently receives an erroneous answer with known probability p<1/2 and a correct one with probability 1-p). We study this problem both with adversarial errors and independent noise models. First, we show an algorithm that needs at most (log_2 n)/(1 - H(r)) queries in case of adversarial errors, where the adversary is bounded with its rate of errors by a known constant r<1/2. Our algorithm is in fact a simplification of previous work, and our refinement lies in invoking an amortization argument. We then show that our algorithm coupled with a Chernoff bound argument leads to a simpler algorithm for the independent noise model and has a query complexity that is both simpler and asymptotically better than the one of Emamjomeh-Zadeh et al. [STOC 2016]. Our approach has a wide range of applications. First, it improves and simplifies the Robust Interactive Learning framework proposed by Emamjomeh-Zadeh and Kempe [NIPS 2017]. Secondly, performing analogous analysis for edge-queries (where a query to an edge e returns its endpoint that is closer to the target) we actually recover (as a special case) a noisy binary search algorithm that is asymptotically optimal, matching the complexity of Feige et al. [SIAM J. Comput. 1994]. Thirdly, we improve and simplify upon an algorithm for searching of unbounded domains due to Aslam and Dhagat [STOC 1991]
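
As a small worked illustration of the adversarial-error bound quoted above, (log_2 n)/(1 - H(r)), the sketch below evaluates it numerically. It assumes H denotes the binary entropy function, which the abstract does not spell out, and the function names are hypothetical. It only evaluates the formula; it does not implement the search algorithm.

```python
import math


def binary_entropy(r):
    """Binary entropy H(r) in bits (assumed meaning of H in the bound)."""
    if r in (0.0, 1.0):
        return 0.0
    return -r * math.log2(r) - (1 - r) * math.log2(1 - r)


def adversarial_query_bound(n, r):
    """Evaluate (log_2 n) / (1 - H(r)) for an error rate 0 <= r < 1/2."""
    if not 0 <= r < 0.5:
        raise ValueError("error rate r must satisfy 0 <= r < 1/2")
    return math.log2(n) / (1 - binary_entropy(r))
```

With `r = 0` the expression reduces to log_2 n, the familiar error-free binary search count, and it grows without bound as r approaches 1/2.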

    Point-to-point and congestion bandwidth estimation: experimental evaluation on PlanetLab

    In large scale Internet platforms, measuring the available bandwidth between nodes of the platform is difficult and costly. However, having access to this information makes it possible to design clever algorithms that optimize resource usage for some collective communications, such as broadcasting a message or organizing master/slave computations. In this paper, we analyze the feasibility of providing estimates, based on a limited number of measurements, for the point-to-point available bandwidth values, and for the congestion which happens when several communications take place at the same time. We present a dataset obtained with both types of measurements performed on a set of nodes from the PlanetLab platform. We show that matrix factorization techniques are quite efficient at predicting point-to-point available bandwidth, but are not suited to congestion analysis. However, a LastMile modeling of the platform makes it possible to perform congestion predictions with a reasonable level of accuracy, even with a small amount of information, despite the variability of the measured platform

    Bedibe: Datasets and Software Tools for Distributed Bandwidth Prediction

    Being able to predict available bandwidth is a crucial issue for a large number of distributed applications on the Internet. Several solutions have been proposed, but the lack of common implementations and recognized datasets makes it difficult to compare and reproduce results. In this paper, we present bedibe, the combination of bandwidth measurements performed on PlanetLab and a software tool that facilitates writing and studying algorithms for bandwidth prediction. bedibe includes implementations of the best solutions from the literature, and aims to facilitate the comparison of results obtained by the different teams working on this topic

    Tight Tradeoffs for Real-Time Approximation of Longest Palindromes in Streams

    We consider computing a longest palindrome in the streaming model, where the symbols arrive one-by-one and we do not have random access to the input. While computing the answer exactly using sublinear space is not possible in such a setting, one can still hope for a good approximation guarantee. Our contribution is twofold. First, we provide lower bounds on the space requirements for randomized approximation algorithms processing inputs of length n. We rule out Las Vegas algorithms, as they cannot achieve sublinear space complexity. For Monte Carlo algorithms, we prove a lower bound of Omega(M log min {|Sigma|, M}) bits of memory; here M=n/E for approximating the answer with additive error E, and M= log n / log (1 + epsilon) for approximating the answer with multiplicative error (1 + epsilon). Second, we design three real-time algorithms for this problem. Our Monte Carlo approximation algorithms for both additive and multiplicative versions of the problem use O(M) words of memory. Thus the obtained lower bounds are asymptotically tight up to a logarithmic factor. The third algorithm is deterministic and finds a longest palindrome exactly if it is short. This algorithm can be run in parallel with a Monte Carlo algorithm to obtain better results in practice. Overall, both the time and space complexity of finding a longest palindrome in a stream are essentially settled

    RLE Edit Distance in Near Optimal Time

    We show that the edit distance between two run-length encoded strings of compressed lengths m and n respectively, can be computed in O(mn log(mn)) time. This improves the previous record by a factor of O(n/log(mn)). The running time of our algorithm is within subpolynomial factors of being optimal, subject to the standard SETH-hardness assumption. This effectively closes a line of algorithmic research first started in 1993
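
For orientation, the sketch below shows run-length encoding together with the classical quadratic dynamic program run on the decompressed strings. This is only a reference baseline and not the O(mn log(mn)) algorithm from the paper, which works directly on the compressed representation. All function names are hypothetical.

```python
from itertools import groupby


def rle(s):
    """Run-length encode a string as a list of (symbol, run_length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def rle_decode(runs):
    """Expand a list of (symbol, run_length) pairs back into a string."""
    return "".join(ch * k for ch, k in runs)


def edit_distance(a, b):
    """Classical DP over the uncompressed strings, O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]
```

The baseline's running time depends on the decompressed lengths, which can be far larger than the numbers of runs m and n that the stated bound depends on.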

    Approximation Strategies for Generalized Binary Search in Weighted Trees

    We consider the following generalization of the binary search problem. A search strategy is required to locate an unknown target node $t$ in a given tree $T$. Upon querying a node $v$ of the tree, the strategy receives as a reply an indication of the connected component of $T\setminus\{v\}$ containing the target $t$. The cost of querying each node is given by a known non-negative weight function, and the considered objective is to minimize the total query cost for a worst-case choice of the target. Designing an optimal strategy for a weighted tree search instance is known to be strongly NP-hard, in contrast to the unweighted variant of the problem which can be solved optimally in linear time. Here, we show that weighted tree search admits a quasi-polynomial time approximation scheme: for any $0 < \varepsilon < 1$, there exists a $(1+\varepsilon)$-approximation strategy with a computation time of $n^{O(\log n / \varepsilon^2)}$. Thus, the problem is not APX-hard, unless $NP \subseteq DTIME(n^{O(\log n)})$. By applying a generic reduction, we obtain as a corollary that the studied problem admits a polynomial-time $O(\sqrt{\log n})$-approximation. This improves previous $\hat O(\log n)$-approximation approaches, where the $\hat O$-notation disregards $O(\mathrm{poly}\log\log n)$-factors